value vector
- North America > United States > California > Los Angeles County > Glendale (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- Asia > China > Hong Kong (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Multi-Value Alignment for LLMs via Value Decorrelation and Extrapolation
Xu, Hefei; Wu, Le; Cheng, Chen; Liu, Hao
With the rapid advancement of large language models (LLMs), aligning them with human values for safety and ethics has become a critical concern. The problem is especially challenging when multiple, potentially conflicting human values must be considered and balanced. Although several variants of existing alignment methods (such as Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO)) have been proposed to address multi-value alignment, they suffer from notable limitations: 1) they are often unstable and inefficient in multi-value optimization; and 2) they fail to effectively handle value conflicts. As a result, these approaches typically struggle to achieve optimal trade-offs when aligning multiple values. To address this challenge, we propose a novel framework called Multi-Value Alignment (MVA). It mitigates alignment degradation caused by parameter interference among diverse human values by minimizing their mutual information. Furthermore, we propose a value extrapolation strategy to efficiently explore the Pareto frontier, thereby constructing a set of LLMs with diverse value preferences. Extensive experiments demonstrate that MVA consistently outperforms existing baselines in aligning LLMs with multiple human values.
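The abstract names its two mechanisms without formulas; as a minimal sketch of the second, value extrapolation can be read as weight arithmetic over per-value fine-tuned models. Everything below (the function name, the linear-offset form, the toy weights) is our assumption for illustration, not the paper's definition:

```python
import numpy as np

def extrapolate_values(theta_base, theta_values, alphas):
    # Hypothetical reading of "value extrapolation": treat each value-aligned
    # model as an offset from a shared base and mix the offsets; alphas > 1.0
    # step beyond a fine-tuned endpoint rather than interpolating.
    delta = sum(a * (t - theta_base) for a, t in zip(alphas, theta_values))
    return theta_base + delta

# Toy weight vectors standing in for a base LLM and two value-aligned variants.
theta_base = np.zeros(4)
theta_helpful = np.array([1.0, 0.0, 0.5, 0.0])
theta_harmless = np.array([0.0, 1.0, 0.0, 0.5])

# Sweeping the mixing weights traces candidate models along a Pareto frontier.
for alphas in [(0.5, 0.5), (1.0, 0.5), (1.5, -0.25)]:
    theta = extrapolate_values(theta_base, [theta_helpful, theta_harmless], alphas)
    print(alphas, theta)
```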
- Asia > China > Anhui Province > Hefei (0.40)
- Europe > Austria > Vienna (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (4 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
The Rise of Parameter Specialization for Knowledge Storage in Large Language Models
Hong, Yihuai; Zhao, Yiran; Tang, Wei; Deng, Yang; Rong, Yu; Zhang, Wenxuan
Over time, a growing wave of large language models from various series has been introduced to the community. Researchers are striving to maximize the performance of language models under constrained parameter sizes. From a microscopic perspective, however, there has been limited research on how knowledge is best stored in model parameters, particularly within MLPs, so that the model can use it effectively. In this work, we analyze twenty publicly available open-source large language models to investigate the relationship between their strong performance and the way knowledge is stored in their MLP parameters. Our findings reveal that as language models become more advanced and demonstrate stronger knowledge capabilities, their parameters exhibit increased specialization: parameters in the MLPs tend to be more focused on encoding similar types of knowledge. We experimentally validate that this specialized distribution of knowledge improves the efficiency of knowledge utilization, and causal training experiments confirm that it plays a critical role in the model's ability to leverage stored knowledge.
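To make "parameter specialization" concrete, here is a toy metric of our own construction (not the paper's measure): score each MLP neuron by how concentrated the knowledge-category labels of its top-activating inputs are, with scores near 1.0 meaning the neuron fires mostly for one knowledge type.

```python
import numpy as np

def specialization_score(activations, labels, num_categories, top_k=50):
    # Hypothetical specialization metric: for each neuron (column), take its
    # top-k activating probe inputs and measure the largest share of a single
    # knowledge category among their labels.
    scores = []
    for neuron_acts in activations.T:            # shape (num_inputs,) per neuron
        top = np.argsort(neuron_acts)[-top_k:]   # indices of strongest inputs
        counts = np.bincount(labels[top], minlength=num_categories)
        scores.append(counts.max() / top_k)
    return np.array(scores)

rng = np.random.default_rng(0)
acts = rng.standard_normal((1000, 64))    # 1000 probe inputs x 64 MLP neurons
labels = rng.integers(0, 5, size=1000)    # 5 synthetic knowledge categories
print(specialization_score(acts, labels, num_categories=5).mean())
```

On random data this averages near the chance level of 0.2; the paper's claim, in these terms, would be that stronger models show scores drifting well above chance.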
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Asia > Singapore (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- (9 more...)
Learned Structure in Cartridges: Keys as Shareable Routers in Self-Studied Representations
A bottleneck for long-context LLM inference is the linearly growing KV cache. Recent work has proposed Cartridges, an approach which leverages offline compute to train a much smaller KV cache than is typically required for a full document (up to 40x less memory usage at inference time). In this paper, we present the first mechanistic exploration of the learned Cartridge key-value cache structure. In particular, we propose that (1) Cartridge keys act as stable, shareable retrieval routers for the compressed corpora and (2) most of the learned compression occurs within the Cartridge value vectors. We present empirical evidence of our routing theory across tasks, model families, and model sizes; for example, we can ablate the learned Cartridge key vectors between tasks with little performance loss. Finally, we propose a slight improvement in initialization called Sampled Chunk Initialization (SCI). We suggest that SCI can lead to faster Cartridge convergence than previously demonstrated in the literature. Our findings lay the groundwork for broader empirical study of Cartridge training optimization which may be crucial for further scaling.
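The routing claim suggests an ablation one can mimic at toy scale: route queries with the keys of a different cartridge while keeping the original values. The sketch below (random tensors, single head, our naming throughout) shows only the mechanics of such a key-swap, not the paper's trained setup:

```python
import numpy as np

def attend(queries, keys, values):
    # Single-head attention over a small KV "cartridge":
    # keys route each query, values carry the compressed content.
    scores = queries @ keys.T / np.sqrt(keys.shape[1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values

rng = rng = np.random.default_rng(0)
d, slots = 32, 8                             # tiny cartridge: 8 KV slots
q = rng.standard_normal((4, d))              # 4 incoming queries
keys_a = rng.standard_normal((slots, d))     # cartridge A keys (routers)
vals_a = rng.standard_normal((slots, d))     # cartridge A values (content)
keys_b = rng.standard_normal((slots, d))     # cartridge B keys, another task

# Key-swap in the spirit of the routing claim: keep A's values but route
# with B's keys, then compare outputs. The paper reports little loss when
# swapping trained keys between tasks; random tensors only show the plumbing.
out_orig = attend(q, keys_a, vals_a)
out_swap = attend(q, keys_b, vals_a)
print(np.linalg.norm(out_orig - out_swap))
```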
- Asia > Middle East > Jordan (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Research Report > New Finding (0.48)
- Research Report > Experimental Study (0.46)
A Transformer Inspired AI-based MIMO receiver
Rácz, András; Borsos, Tamás; Veres, András; Csala, Benedek
We present AttDet, a Transformer-inspired MIMO (Multiple Input Multiple Output) detection method that treats each transmit layer as a token and learns inter-stream interference via a lightweight self-attention mechanism. Queries and keys are derived directly from the estimated channel matrix, so attention scores quantify channel correlation. Values are initialized by matched-filter outputs and iteratively refined. The AttDet design combines model-based interpretability with data-driven flexibility. Through link-level simulations under realistic 5G channel models and high-order, mixed QAM modulation and coding schemes, we demonstrate that AttDet can approach near-optimal BER/BLER (Bit Error Rate/Block Error Rate) performance while maintaining predictable, polynomial complexity.
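To make the query/key/value mapping concrete, here is an untrained single-step toy (our simplification; the actual detector learns its weights and iterates): queries and keys per layer are the channel columns, so the score matrix is the channel Gram matrix, and values start at the matched-filter output. The step size and QPSK setup are arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rx, n_tx = 8, 4                                  # receive antennas x transmit layers
H = (rng.standard_normal((n_rx, n_tx))
     + 1j * rng.standard_normal((n_rx, n_tx))) / np.sqrt(2)
x = rng.choice([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j], size=n_tx) / np.sqrt(2)  # QPSK layers
y = H @ x + 0.05 * (rng.standard_normal(n_rx) + 1j * rng.standard_normal(n_rx))

# Queries/keys per layer are the channel columns, so the attention score
# matrix is |h_i^H h_j|, i.e. inter-stream correlation.
scores = np.abs(H.conj().T @ H)
np.fill_diagonal(scores, -np.inf)                  # attend only to *other* layers
attn = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)

v = H.conj().T @ y                                 # values: matched-filter outputs
v = v - 0.1 * (attn @ v)                           # one refinement step (toy step size)
print(np.sign(v.real) + 1j * np.sign(v.imag))      # hard QPSK decisions
```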
Design Principles for Sequence Models via Coefficient Dynamics
Sieber, Jerome; Orvieto, Antonio; Zeilinger, Melanie N.; Alonso, Carmen Amo
Deep sequence models, ranging from Transformers and State Space Models (SSMs) to more recent approaches such as gated linear RNNs, fundamentally compute outputs as linear combinations of past value vectors. To draw insights and systematically compare such architectures, we develop a unified framework that makes this output operation explicit, by casting the linear combination coefficients as the outputs of autonomous linear dynamical systems driven by impulse inputs. This viewpoint, in spirit substantially different from approaches focusing on connecting linear RNNs with linear attention, reveals a common mathematical theme across diverse architectures and crucially captures softmax attention, on top of RNNs, SSMs, and related models. In contrast to new model proposals that are commonly evaluated on benchmarks, we derive design principles linking architectural choices to model properties, thereby identifying tradeoffs between expressivity and efficient implementation, geometric constraints on input selectivity, and stability conditions for numerically stable training and information retention. By connecting several insights and observations from recent literature, the framework both explains empirical successes of recent designs and provides guiding principles for systematically designing new sequence model architectures.
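Read this way, the unifying output map is y_t = sum over s <= t of c[t, s] * v_s, with the coefficients c[t, s] produced by an autonomous linear system. A minimal sketch under our reading (toy sizes, random projections; the coefficient constructions are standard textbook forms, not lifted from the paper):

```python
import numpy as np

def output_from_coeffs(coeffs, values):
    # Unified output map: y_t = sum_{s<=t} coeffs[t, s] * values[s]
    return np.tril(coeffs) @ values

rng = np.random.default_rng(0)
T, d = 6, 4
V = rng.standard_normal((T, d))                    # past value vectors

# 1) Softmax attention: coefficients from causally masked query-key logits.
Q, K = rng.standard_normal((T, d)), rng.standard_normal((T, d))
logits = Q @ K.T / np.sqrt(d)
logits[np.triu_indices(T, k=1)] = -np.inf          # causal mask
attn = np.exp(logits)
attn /= attn.sum(axis=1, keepdims=True)

# 2) Gated linear RNN: c[t, s] = prod_{r=s+1}^{t} gamma_r, i.e. the impulse
#    response of a scalar autonomous linear system driven at time s.
gamma = rng.uniform(0.8, 1.0, size=T)
rnn = np.array([[np.prod(gamma[s + 1:t + 1]) if s <= t else 0.0
                 for s in range(T)] for t in range(T)])

# Same output operation, two different coefficient dynamics.
print(output_from_coeffs(attn, V).shape, output_from_coeffs(rnn, V).shape)
```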
- Europe > Switzerland > Zürich > Zürich (0.14)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
- North America > United States > California > Santa Clara County > Stanford (0.04)
- (4 more...)